18 research outputs found

Introduction to speech processing using deep learning techniques (original title: Introducción al procesamiento del habla mediante técnicas de deep learning)

Author: López Zorrilla Asier
Publication date: 24/06/2016

This work presents a theoretical study of the architecture of speech recognizers based on generative models. Specifically, two different systems are analyzed: systems based on hidden Markov models and Gaussian mixtures, and hybrid models that combine hidden Markov models with neural networks. To this end, the work begins with an introduction to the speech recognition problem. It then gives a general analysis of Gaussian mixture models, hidden Markov models, and neural networks. Finally, it presents the Kaldi toolkit, which is used to run several experiments that compare and analyze the characteristics of the different speech recognition systems. In particular, the focus is on studying the behavior of the neural networks

Archivo Digital para la Docencia y la Investigación

A Differentiable Generative Adversarial Network for Open Domain Dialogue

Author: De Velasco Vázquez Mikel
López Zorrilla Asier
Torres Barañano María Inés
Publication date: 01/04/2019

Paper presented at IWSDS 2019: International Workshop on Spoken Dialogue Systems Technology, Siracusa, Italy, April 24-26, 2019. This work presents a novel methodology to train open-domain neural dialogue systems within the framework of Generative Adversarial Networks with gradient-based optimization methods. We avoid the non-differentiability of text-generating networks by approximating the word vector corresponding to each generated token via a top-k softmax. We show that a weighted average of the word vectors of the most probable tokens, with weights given by the top-k softmax probabilities, is a good approximation of the word vector of the generated token. Finally, we demonstrate through a human evaluation that training a neural dialogue system via adversarial learning with this method successfully discourages it from producing generic responses; instead, it tends to produce more informative and varied ones. This work has been partially funded by the Basque Government under grant PRE_2017_1_0357, by the University of the Basque Country UPV/EHU under grant PIF17/310, and by the H2020 RIA EMPATHIC (Grant N: 769872)

Archivo Digital para la Docencia y la Investigación
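
The top-k softmax approximation described in the abstract of "A Differentiable Generative Adversarial Network for Open Domain Dialogue" can be illustrated with a minimal sketch. The function name, the toy vocabulary, and the choice to renormalize the softmax over only the k largest logits are assumptions made for illustration; the abstract does not give the paper's exact formulation. The sketch uses plain Python, so it computes only the forward value; in an automatic-differentiation framework the same weighted average would let gradients flow back to the logits, which a hard argmax would not.

```python
import math

def topk_softmax_embedding(logits, embeddings, k):
    """Approximate the word vector of a generated token as the
    probability-weighted average of the k most probable tokens' vectors.
    (Hypothetical helper; not the authors' implementation.)"""
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to those k logits, shifted by the max for stability.
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Weighted average of the selected word vectors, one dimension at a time.
    dim = len(embeddings[0])
    return [sum(p * embeddings[i][d] for p, i in zip(probs, top))
            for d in range(dim)]

# Toy example: a 3-token vocabulary with 2-dimensional word vectors.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
approx = topk_softmax_embedding([0.0, 0.0, -100.0], toy_embeddings, k=2)
```

With k=1 the function returns the word vector of the single most probable token, so k controls how far the approximation departs from a hard choice.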

A multilingual neural coaching model with enhanced long-term dialogue structure

Author: López Zorrilla Asier
Torres Barañano María Inés
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 12/07/2022

In this work we develop a fully data-driven conversational agent capable of carrying out motivational coaching sessions in Spanish, French, Norwegian, and English. Unlike the majority of coaching agents, and of well-being related conversational agents in general, that can be found in the literature, ours is not designed with handcrafted rules. Instead, we directly model the coaching strategy of professionals with end users. To this end, we gather a set of virtual coaching sessions through a Wizard of Oz platform, and apply state-of-the-art Natural Language Processing techniques. We employ a transfer learning approach, pretraining GPT2 neural language models and fine-tuning them on our corpus. However, since these models only take a local dialogue history as input, a simple fine-tuning procedure is not capable of modeling the long-term dialogue strategies that appear in coaching sessions. To alleviate this issue, we first propose to learn dialogue phase and scenario embeddings in the fine-tuning stage. These indicate to the model which part of the dialogue it is in and which kind of coaching session it is carrying out. Second, we develop a global deep learning system which controls the long-term structure of the dialogue. We also show that this global module can be used to visualize and interpret the decisions taken by the conversational agent, and that the learnt representations are comparable to dialogue acts. Automatic and human evaluations show that our proposals improve on the baseline models.
Finally, interaction experiments with coaching experts indicate that the system is usable and gives rise to positive emotions in Spanish, French and English, while the results in Norwegian indicate that there is still work to be done on fully data-driven approaches for very low-resource languages. This work has been partially funded by the Basque Government under grant PRE_2017_1_0357 and by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 769872

Archivo Digital para la Docencia y la Investigación

Audio Embeddings help to learn better dialogue policies

Author: Cuayáhuitl Heriberto
López Zorrilla Asier
Torres Barañano María Inés
Publication date: 01/12/2021

Presented at ASRU 2021, Cartagena (Colombia), December 13-17, 2021. Neural transformer architectures have gained a lot of interest for text-based dialogue management in the last few years. They have shown high learning capabilities for open domain dialogue with huge amounts of data and also for domain adaptation in task-oriented setups. But the potential benefits of exploiting the users’ audio signal have rarely been explored in such frameworks. In this work, we combine text dialogue history representations generated by a GPT-2 model with audio embeddings obtained by the recently released Wav2Vec2 transformer model. We jointly fine-tune these models to learn dialogue policies via supervised learning and two policy gradient-based reinforcement learning algorithms. Our experimental results, using the DSTC2 dataset and a simulated user model capable of sampling audio turns, reveal that audio embeddings lead to overall higher task success (than without using audio embeddings) with statistically significant results across evaluation metrics and training algorithms

Archivo Digital para la Docencia y la Investigación

Corrective Focus Detection in Italian Speech Using Neural Networks

Author: Cenceschi Sonia
De Velasco Vázquez Mikel
López Zorrilla Asier
Torres Barañano María Inés
Publication venue: 'Obuda University'
Publication date: 01/01/2018

The corrective focus is a particular kind of prosodic prominence with which the speaker intends to correct or to emphasize a concept. This work develops an Artificial Cognitive System (ACS) based on Recurrent Neural Networks that analyzes suitable features of the audio channel in order to automatically identify the Corrective Focus in speech signals. Two different approaches to build the ACS have been developed. The first one addresses the detection of focused syllables within a given Intonational Unit (IU), whereas the second one identifies a whole IU as focused or not. The experimental evaluation over an Italian Corpus has shown the ability of the Artificial Cognitive System to identify the focus in the speaker IUs. This ability can lead to further important improvements in human-machine communication. The addressed problem is a good example of synergies between Humans and Artificial Cognitive Systems. The research leading to the results in this paper has been conducted in the project EMPATHIC (Grant N: 769872) that received funding from the European Union’s Horizon 2020 research and innovation programme. Additionally, this work has been partially funded by the Spanish Minister of Science under grants TIN2014-54288-C4-4-R and TIN2017-85854-C4-3-R, by the Basque Government under grant PRE_2017_1_0357, and by the University of the Basque Country UPV/EHU under grant PIF17/310

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital para la Docencia y la Investigación

Dialogue Management and Language Generation for a Robust Conversational Virtual Coach: Validation and User Study

Author: López Zorrilla Asier
Olaso Fernández Javier Mikel
Torres Barañano María Inés
Vázquez Risco Alain
Publication venue: 'MDPI AG'
Publication date: 01/01/2023

Designing human–machine interactive systems requires cooperation between different disciplines. In this work, we present a Dialogue Manager and a Language Generator that are the core modules of a Voice-based Spoken Dialogue System (SDS) capable of carrying out challenging, long and complex coaching conversations. We also develop an efficient integration procedure for the whole system, which will act as an intelligent and robust Virtual Coach. The coaching task significantly differs from the classical applications of SDSs, resulting in a much higher degree of complexity and difficulty. The Virtual Coach has been successfully tested and validated in a user study with independent elderly people in three different countries with three different languages and cultures: Spain, France and Norway. The research presented in this paper has been conducted as part of the project EMPATHIC that has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant No. 769872. Additionally, this work has been partially funded by projects BEWORD and AMIC-PC of the Minister of Science of Technology, under Grant Nos. PID2021-126061OB-C42 and PDC2021-120846-C43, respectively. Vázquez and López Zorrilla received a PhD scholarship from the Basque Government, with Grant Nos. PRE 2020 1 0274 and PRE 2017 1 0357, respectively

Directory of Open Access Journals

Archivo Digital para la Docencia y la Investigación

Can Spontaneous Emotions be Detected from Speech on TV Political Debates?

Author: De Velasco Vázquez Mikel
Justo Blanco Raquel
López Zorrilla Asier
Torres Barañano María Inés
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

Accepted paper. Decoding emotional states from multimodal signals is an increasingly active domain within the framework of affective computing, which aims at a better understanding of Human-Human Communication as well as at improving Human-Computer Interaction. But the automatic recognition of spontaneous emotions from speech is a very complex task due to the lack of certainty about the speaker states as well as to the difficulty of identifying a variety of emotions in real scenarios. In this work we explore the extent to which emotional states can be decoded from speech signals extracted from TV political debates. The labelling procedure was supported by perception experiments where only a small set of emotions has been identified. In addition, some scaled judgements of valence, arousal and dominance were also provided. In this framework the paper shows meaningful comparisons between both the dimensional and the categorical models of emotions, which is a new contribution when dealing with spontaneous emotions. To this end Support Vector Machines (SVM) as well as Feedforward Neural Networks (FNN) have been proposed to develop classifiers and predictors. The experimental evaluation over a Spanish corpus has shown the ability of both models to be identified in speech segments by the proposed artificial systems. This work has been partially funded by the Spanish Government under grant TIN2017-85854-C4-3-R (AEI/FEDER,UE) and conducted in the project EMPATHIC (Grant No. 769872) funded by the European Union’s H2020 research and innovation program

Crossref

Archivo Digital para la Docencia y la Investigación

Detection of fraudulent behaviour in crowdsourced labelling (original title: Iruzurrezko portaeren detekzioa crowd motako etiketazioan)

Author: De Velasco Vázquez Mikel
Justo Blanco Raquel
López Zorrilla Asier
Publication venue: 'Udako Euskal Unibertsitatea'
Publication date: 01/05/2019

This work aims at detecting low-quality labels in crowdsourcing annotation tasks. We validate our proposal by carrying out experiments on a difficult and subjective task: emotion recognition. We have developed several measures to detect fraudulent behaviour, including measures related to the labelling time, the inter-worker agreement and the distribution of the answers. Not only do we show that each of the described measures is helpful, but we also demonstrate that combining them increases the probability of detecting fraudulent workers. The authors would like to thank the University of the Basque Country, the Spanish Government (grant TIN2017-85854-C4-3-R) and the European Commission (grant 769872 of the RIA 7 call of the H2020 SC1-PM15 programme) for supporting this research

Archivo Digital para la Docencia y la Investigación

Self-taught machines that learn to speak Basque (original title: Euskaraz hitz egiten ikasten duten makina autodidaktak)

Author: De Velasco Vázquez Mikel
Justo Blanco Raquel
López Zorrilla Asier
Publication venue: 'Udako Euskal Unibertsitatea'
Publication date: 01/05/2019

In this work we present an automatic dialogue system that learns to speak Basque by means of neural networks. To that end, we have used generative adversarial networks, which implement the idea of the Turing test computationally. We show that it is possible to train such networks with a Basque corpus two orders of magnitude smaller than the English corpora that are normally used. Finally, we show that it is advisable to use preprocessing that takes Basque morphology into account. To the best of our knowledge, we present the first Basque dialogue system based on neural networks. The authors would like to thank the Basque Government, the University of the Basque Country and the European Commission for supporting this research through grants PRE 2017 1 0357 and PIF17/310 and through grant 769872 of the RIA 7 call of the H2020 SC1-PM15 programme, respectively

Archivo Digital para la Docencia y la Investigación

A Spanish Corpus for Talking to the Elderly

Author: Ben Letaifa Zouari Leila
De Velasco Vázquez Mikel
Justo Blanco Raquel
López Zorrilla Asier
Olaso Javier Mikel
Torres Barañano María Inés
Vázquez Risco Alain
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/10/2020

Paper presented at the 11th International Workshop on Spoken Dialogue Systems, IWSDS 2020; Madrid; Spain; 21 September 2020 through 23 September 2020. In this work, a Spanish corpus that was developed within the framework of the EMPATHIC project (http://www.empathic-project.eu/) is presented. It was designed for building a dialogue system capable of talking to elderly people and promoting healthy habits through a coaching model. The corpus, which comprises audio, video and text channels, was acquired by using a Wizard of Oz strategy. It was annotated with different labels according to the different models that are needed in a dialogue system, including an emotion-based annotation that will be used to generate empathetic system reactions. The annotation at different levels and the procedure employed are described and analysed

Crossref

Archivo Digital para la Docencia y la Investigación